Summary of Review

The Center for Research on Education Outcomes (CREDO) at Stanford University conducted a large-scale analysis of the impact of charter schools on student performance. The center’s data covered 65-70% of the nation’s charter schools. Although results varied by state, 17% of the charter school students had significantly higher math results than their matched twins in comparable traditional public schools (TPS), while 37% had significantly worse results. The CREDO study strengthens the well-established, broader body of evidence showing average charter performance to be equal to, or perhaps lower than, the performance of traditional schools, a body of evidence that is summarized in this review. The study also presents some state-level analyses concerning policy options; this review points out limitations with those analyses and also explores other policy implications of the report’s findings. The relative strength and comprehensiveness of the data set used for this study, as well as the solid analytic approaches of the researchers, make this report a useful contribution to the charter school research base. Nevertheless, this review points out some weaknesses and areas for improvement, many of which represent commonplace limitations for this type of study that should be shared in the technical report.




Review 



I. Introduction 

In recent years, several attempts have been made to draw overall conclusions regarding charter school performance from multiple states or from multiple studies. These efforts seek to inform broader policy decisions regarding whether charter school policies are likely to generate schools that are more effective than traditional public schools. The new study from the Center for Research on Education Outcomes (CREDO) at Stanford University, Multiple Choice: Charter School Performance in 16 States, goes beyond earlier efforts in its attempt to answer the broader policy question about the relative performance of charter schools.

Charter schools, by design, receive more autonomy in operations; in exchange, they are to be held more accountable than other public schools for student outcomes. Charter leaders use this autonomy to create their own schools, select their own governing boards, design educational interventions appropriate for students’ unique needs and learning styles, and hire and fire teachers more freely. In turn, the enhanced autonomy granted to charter schools was expected to result in, among other things, better performance by the students enrolled in them.

Performance accountability was to be effected through market mechanisms (with funding following students) and contractual relationships with authorizers. If parents were not satisfied, they would not enroll or they would choose to leave, which could eventually bankrupt the school. If the authorizer (e.g., a state board of education or a local school district) did not believe the school was living up to its mission and agreed-upon contract, the contract or charter could be revoked or not renewed after it expired. When poorly performing charter schools are removed from the ranks, the aggregate results of the remaining charter schools should go up. (In contrast, one would expect average results from traditional public schools, which are not easily closed, to be weighed down by a number of poorly performing schools.) These sorts of accountability mechanisms also suggest why charter schools were, on average, expected to outperform traditional public schools.

Research on charter performance, such as the new CREDO study, is intended to evaluate whether or not charter policies are fulfilling policymakers’ intentions. The scope and relative rigor of the CREDO study reinforce the larger body of evidence, which shows no overall impact of charter schools on performance.

II. Findings and Conclusions of the Report

Findings of the CREDO report are stated to be based on a rigorous analysis of more than 70% of the nation’s students attending charter schools. The key findings include the following:

• Charter school students on average saw a decrease in their academic growth of 0.01 standard deviations in reading and 0.03 standard deviations in math. These decreases are small but statistically significant.

• In math, 17% of the charter school students had significantly higher results than their matched twins in comparable traditional public schools (TPS), while 37% had significantly worse results. Forty-six percent of the charter school students had math gains that were statistically indistinguishable from the average growth among the comparison TPS students.

• Five states had significantly higher learning gains for charter school students than would have occurred in traditional schools: Arkansas, Colorado (Denver), Illinois (Chicago), Louisiana, and Missouri.

• Six states had significantly lower average charter school student growth than their TPS matched peers: Arizona, Florida, Minnesota, New Mexico, Ohio, and Texas.

• Four states had results that did not differ from the traditional public schools: California, District of Columbia, Georgia, and North Carolina.

• States with caps on the number of charter schools tended to have lower academic growth, as did states that had multiple authorizers. Evidence also suggested that states with an appeal process to overturn denied charter applications had a small but significant advantage in terms of growth in student achievement.

• Charter students in elementary and middle school grades had small but significantly higher rates of learning than their matched peers in traditional public schools, but students in charter high schools and charter multi-level schools had significantly worse results.

• Gains in achievement scores were lower for African American and Hispanic students enrolled in charter schools than for their matched peers in traditional public schools.

• Charter schools were found to have slightly better academic growth results for students in poverty and for students who are classified as English Language Learners.



• Students in special education programs had similar outcomes as their matched peers in traditional public schools.

• Charters in their first year showed results that ranged from poor to very poor.

III. The Report’s Rationale for Its Findings and Conclusions

This is an empirical report. The findings and conclusions are based on a longitudinal student-level data set created by the researchers at CREDO. This data set reportedly includes student-level data from 16 states (including D.C.), although the Illinois and Colorado data comprise charter school students in only Chicago and Denver, respectively.

Although full details on the methods are not included, it is apparent that conclusions are based solely on the findings from the analysis of the student-level data set.

Conclusions about policy implications are based on a more rudimentary classification of states, with regard to (i) whether they have caps on the numbers of charter schools, (ii) whether they have multiple authorizers, and (iii) whether they have an appeal process. A discussion of these policy implications is included later in this review.

IV. The Report’s Use of Research Literature

The contents of the report focus on its new findings. Minimal attention is given to the existing body of literature on student achievement in charter schools, and little effort was made to link the new findings to that existing body of evidence.

The research base on charter schools has clearly improved over time. In the mid-1990s, this research largely focused on start-up issues and the degree to which these schools were innovative or the extent to which they promoted racial integration or segregation. By the end of the 1990s, more of the evaluations of charter schools addressed student achievement.

In 2001, Miron and Nelson prepared a synthesis of evidence on student achievement and charter schools and found 15 studies of charter school achievement across 8 states. Half of these were of limited scope, meaning they included few schools and few years of data. Using similar selection criteria for studies in 2008, Miron, Evergreen, and Urschel identified and synthesized the evidence from 47 studies. While this represented a large increase in the number of studies, many of these were still of limited scope.

National Studies of Student Achievement in Charter Schools

The press release for the CREDO study makes the claim that this is the first national study of charter school impact. Although the claim is overstated, there is an element of truth. Prior to the CREDO study, there were a number of studies that truly encompassed charter results across the nation (see, for example, Nelson, Rosenberg & Van Meter, 2004; Hoxby, 2004; Lubienski & Lubienski, 2006; Braun, Jenkins & Grigg, 2006). However, all of these national studies were based on cross-sectional designs, which meant they could describe the relative status or performance of charter schools, but they could not measure impact or change over time. Except for the Hoxby report, which used school-level data from state assessment programs, these studies used student-level data from the National Assessment of Educational Progress (NAEP). Also except for Hoxby, all of these national studies found that charter school students had test results similar to or worse than comparable traditional public school students. The results from the CREDO study are similar to those of the earlier national studies, although the CREDO study goes beyond these earlier ones by using longitudinally linked student-level data, with a large (almost national) scope, to look at changes in charter schools over time.

Multi-State Studies of Charter Schools 

The CREDO study is a large, multi-state study that claims to cover 15 states plus the District of Columbia (DC). In fact, the results indicate that Massachusetts was left out, and thus only 14 states and DC are included.

There have been earlier efforts to combine findings across multiple states. Using school-level data, Loveless (2003) combined achievement results in 10 states and compared these with traditional public schools. Miron, Coryn, & Mackety (2007) similarly used school-level data to conduct a six-state study of the Great Lakes states, which covered one-quarter of the nation’s charter schools. The Loveless study used z-scores, similar to the CREDO study, to combine and compare results across states. The Great Lakes study used residual scores. The Loveless study found that charter schools were performing at lower levels than were traditional public schools, but they were improving more rapidly than traditional public schools over time. The Great Lakes study had similar results. However, that study also found that as performance levels of charter schools in more mature states improved and approached the level of demographically similar traditional public schools, they tended to level off and remain similar to the performance level of traditional public schools. Only in Illinois did the charter schools’ performance level eventually surpass that of their comparison group.

The results from the CREDO study are similar to those from these earlier multi-state studies, although the CREDO study goes beyond these earlier studies because it is based on student-level data rather than school-level data. This is an important quality distinction.

Longitudinal Studies with Student-Level Data

The researchers at CREDO have made an exceptional effort to secure student-level data from so many states. Most past studies of student achievement in charter schools employ longitudinal designs with school- or group-level data. Although these studies are the most prevalent, they are not as reliable as studies that can examine impact based on student-level data. Since 2001, however, an increasing number of charter school studies have been based on individual student data, which can link test results for students over time to measure gains or changes in performance. Student-level data allow analysts to match charter and non-charter students on a set of demographic characteristics and then track their relative performance over time. The first study of student achievement in charter schools that used a matched student design focused on Arizona and was completed in 2001 (see Solmon, Paark, & Garcia, 2001). Since then, several studies have been based on student-level data, including ones for California (Zimmer et al., 2003), Delaware (Miron et al., 2007), Idaho (Ballou, Teasley, & Zeidner, 2008), Florida (Sass, 2006), North Carolina (Bifulco & Ladd, 2006), and Texas (Gronberg & Jansen, 2005).

A more recent study by RAND (2009) used student-level data and covered eight states. Matched student designs are the most promising development in research on student achievement in charter schools. The costs are reasonably low, and one can conduct large-scale studies with relatively strong controls. The new CREDO study is a positive illustration of what can be done when states are willing to grant researchers access to student-level data sets.

Randomly Controlled Experiments 

Although the CREDO study is both rigorous and comprehensive, there are studies that have employed designs with, at least in terms of internal validity, even greater rigor. These include a number of smaller-scale studies that have simulated random assignment by creating control groups from admissions wait lists (see Hoxby & Rockoff, 2004; Hoxby & Murarka, 2007; Abdulkadiroglu et al., 2008). These studies have shown positive effects for the particular charter schools studied, although they lack external validity since they include a small number of higher-performing charter schools, with sufficiently large waiting lists, that are willing to participate in such studies. That is, the results cannot validly be generalized to less-popular charter schools.

A true randomly controlled experiment will be difficult and expensive and will come with its own set of obstacles. However, the U.S. Department of Education has indeed funded a large and expensive randomly controlled study of student achievement in charter schools, which is being conducted by Mathematica Policy Research Inc. This study will cover only 40 charter schools, but it comes with a price tag of more than $5 million. Although randomly controlled experiments are commonly perceived as the gold standard for study design, they are very difficult to implement when evaluating policies as broad as charter school reforms. The fact that not a single such study has yet been successfully completed on charter schools underlines the difficulty involved in implementing such designs when evaluating large-scale reforms. Until we know whether the Mathematica study can be successfully completed, the CREDO study stands as the study of student achievement in charter schools that best combines rigor and comprehensiveness.

Meta-analyses and Syntheses of Evidence 

A number of efforts have been made to synthesize evidence across studies. In 2001, RAND researchers summarized the evidence across 3 studies, and Miron and Nelson summarized the evidence across 12 studies of student achievement in charter schools. More recent summaries have been prepared by Carnoy et al. (2005), Hill (2006), and the National Alliance for Public Charter Schools (2009). The latter three grouped studies according to whether their findings were positive or negative, and in some cases by design type, but made no effort to weigh and synthesize the results across studies.

The most rigorous attempt to synthesize the evidence on charter school impact was undertaken by Betts and Tang (2008). They used meta-analysis methods to combine and synthesize findings across 14 studies that covered 7 states and 2 school districts, including only the more rigorous studies that used student-level data. The median effect size across these studies was barely distinguishable from zero (0.005). A recent effort by researchers at Western Michigan University synthesized the evidence across 47 studies. The studies were classified by quality, scope, and the nature of their impact. Overall, 19 studies had positive findings, 12 had mixed findings, and 16 had negative findings. The mean impact rating for charters was indistinguishable from zero.
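The difference between tallying studies as positive, mixed, or negative and actually weighing them can be made concrete with a small sketch. It uses standard fixed-effect, inverse-variance pooling, which is one common meta-analytic technique and not necessarily the one Betts and Tang applied; the five effect sizes and standard errors are hypothetical placeholders, not values from any study cited here.

```python
import math

# Hypothetical (effect size in SD units, standard error) pairs for five studies.
# Illustrative only; these are not taken from any cited study.
studies = [(0.10, 0.08), (-0.05, 0.02), (0.02, 0.03), (-0.12, 0.10), (0.01, 0.01)]

def vote_count(studies, z_crit=1.96):
    """Classify each study as positive, negative, or null by its own z test."""
    tally = {"positive": 0, "negative": 0, "null": 0}
    for d, se in studies:
        z = d / se
        if z > z_crit:
            tally["positive"] += 1
        elif z < -z_crit:
            tally["negative"] += 1
        else:
            tally["null"] += 1
    return tally

def pooled_fixed_effect(studies):
    """Inverse-variance weighted mean effect and its standard error."""
    weights = [1.0 / se ** 2 for _, se in studies]
    total = sum(weights)
    pooled = sum(w * d for w, (d, _) in zip(weights, studies)) / total
    return pooled, math.sqrt(1.0 / total)

print(vote_count(studies))
print(pooled_fixed_effect(studies))
```

In this toy case a tally records one negative study and four null ones, while the pooled estimate sits essentially at zero with a small standard error, a conclusion a simple vote count cannot deliver.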

We have included this short review of the literature to supplement the CREDO report and to illustrate that although results vary within and across states, the overall answer for policymakers regarding the impact of charter schools was well established prior to the new report and was indeed reinforced by the new findings: on the whole, charter school students are performing similarly to or slightly worse than comparable students in traditional public schools. This finding has not changed over time, nor has it changed as the body of evidence has expanded to include more states and more rigorous studies. The findings from the CREDO study simply strengthen this overall conclusion.

V. Review of the Report’s Methods

The CREDO study set itself the remarkably ambitious goal of securing data-sharing agreements with 16 charter-school states to obtain student-level data that could be longitudinally linked. However, the researchers’ next task, developing a common data structure necessary for merging data and performing analyses, was potentially more problematic than the first. The complexity of this task warrants specific details as to how the common data structure was constructed. Unfortunately, these details were not provided, even in the technical appendix. In addition, the report shows a considerable lack of clarity regarding the intended unit of analysis. Given the title, and given much of the discussion of findings, we expected the unit of analysis to be charter schools, but all of the analyses presented in the report are based on student-level data.

Four Technical Concerns 

Notwithstanding our judgment that this report offers a great deal of useful information and analyses, we observed four issues with the analysis that should be noted. All four concern methodological choices and reporting of information, so we ask for patience from readers not well-versed in assessment and statistics.

Construction of a Meaningful Outcome Variable

In the process of creating a dependent variable that represents a valid measure of student achievement across 16 (or 15) states, CREDO researchers appear to have confused score standardization and test equating. In constructing the dependent variable, these researchers have, at a minimum, assumed that test equating is not needed to form a valid dependent variable. In fact, the CREDO researchers simply projected the student performance from each different state achievement test onto a common scale and then interpreted all the scores on this scale as if they all meant the same thing. Standardization expresses each score relative to the other students who took the same state test; it does not establish that a given standardized score reflects the same level of achievement on different states’ tests. This represents a common error in psychometric score interpretation.
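The distinction between standardization and equating can be illustrated with a minimal sketch. It assumes, hypothetically, two states whose students differ in true achievement on a common metric and whose tests report on different scales, and it standardizes scores within each state (one plausible reading of how the report built its common scale; the exact procedure is not documented). After standardization the two states look identical, even though the middle student in State B is one unit higher in true achievement than the middle student in State A.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Convert scale scores to z-scores within a single state/grade/year."""
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / sigma for s in scores]

# Hypothetical true achievement on a common metric: each State B student
# is one unit higher than the corresponding State A student.
true_a = [-1.0, 0.0, 1.0]
true_b = [0.0, 1.0, 2.0]

# Each state reports on its own (hypothetical) test scale.
raw_a = [200 + 50 * t for t in true_a]
raw_b = [500 + 10 * t for t in true_b]

z_a = standardize(raw_a)
z_b = standardize(raw_b)
print(z_a)
print(z_b)
```

Equating or linking the tests to a common anchor (for example, an external assessment taken in both states) would be needed before equal z-scores could be read as equal achievement.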

The Virtual Twin as a Match 

The “virtual twin” matching method CREDO created is similar to a matching technique employed by the two authors of this review in their state evaluation of charter schools in Delaware. As in that evaluation, exact matches were established for students in a charter school. Unfortunately, it is not clear in either the report or its technical appendix why the previous test score (t-1) was matched within a ±0.1 range and not by a propensity method, an increasingly common matching method for continuous response variables. Furthermore, exact matching efficiency is a function of the size of the feeder school population. The CREDO report did report the overall percentage of students matched at the state level, but the researchers did not provide the accuracy of the matching procedure at the level at which it was performed (e.g., the feeder school level).
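The virtual-twin procedure, as this review understands it, and the feeder-school match rate requested above can be sketched as follows. The record fields, the particular demographic variables used for exact matching, and the function names are hypothetical; only the ±0.1 tolerance on the prior-year score comes from the description above. Each matched charter student receives a list of TPS students whose outcomes would be averaged to form the twin.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Student:
    # Hypothetical record layout; CREDO's actual data structure is not published.
    sid: str
    feeder_school: str   # TPS the student attended or would have attended
    grade: int
    gender: str
    ethnicity: str
    frl: bool            # free/reduced-price lunch eligibility
    ell: bool            # English language learner
    sped: bool           # special education
    prior_z: float       # prior-year (t-1) standardized score
    charter: bool

CALIPER = 0.1  # prior-score tolerance in SD units

def demographic_key(s):
    """Variables that must match exactly (an assumed list)."""
    return (s.feeder_school, s.grade, s.gender, s.ethnicity, s.frl, s.ell, s.sped)

def build_virtual_twins(students):
    """Return {charter sid: matched TPS sids} and the match rate per feeder school."""
    pool = defaultdict(list)
    for s in students:
        if not s.charter:
            pool[demographic_key(s)].append(s)
    twins = {}
    attempted, matched = defaultdict(int), defaultdict(int)
    for s in students:
        if not s.charter:
            continue
        attempted[s.feeder_school] += 1
        candidates = [t.sid for t in pool[demographic_key(s)]
                      if abs(t.prior_z - s.prior_z) <= CALIPER]
        if candidates:
            twins[s.sid] = candidates
            matched[s.feeder_school] += 1
    rates = {school: matched[school] / n for school, n in attempted.items()}
    return twins, rates
```

Reporting `rates` by feeder school, rather than only a statewide percentage, would show where the matching succeeded and where it thinned out.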

Regarding the suitability of the feeder school logic, CREDO researchers could have assembled all of the virtual twins into a “virtual school” and examined the comparability of the virtual schools to all of the surrounding feeder schools. In a weak way, this can give an indication of the extent of selection bias inherent in the charter school population that has been captured by the matching procedure. If the virtual school looks similar to the feeder schools, then (as measured by the matching variables) there would not be a substantial amount of selection bias operating. Lastly, it is important to remember that matching students does not create matched schools. There is more to a school than its students, and comparing students is not equivalent to comparing schools.
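The virtual-school comparison suggested above could be carried out with a standardized mean difference on each matching variable. This is a minimal sketch with hypothetical variable names and data layout (lists of dictionaries, with binary characteristics coded 0/1). The standardized difference is a common balance diagnostic, and a frequently cited rule of thumb treats absolute values above about 0.1 as a sign of imbalance.

```python
from statistics import mean, pstdev

def standardized_difference(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((pstdev(group_a) ** 2 + pstdev(group_b) ** 2) / 2) ** 0.5
    if pooled_sd == 0:
        return 0.0
    return (mean(group_a) - mean(group_b)) / pooled_sd

def compare_virtual_school(virtual_school, feeder_schools, variables):
    """Standardized difference between the virtual school and its feeders, per variable."""
    return {v: standardized_difference([r[v] for r in virtual_school],
                                       [r[v] for r in feeder_schools])
            for v in variables}

# Hypothetical records for the pooled virtual twins and the feeder-school students.
virtual = [{"prior_z": 0.0, "frl": 1}, {"prior_z": 1.0, "frl": 0}]
feeder = [{"prior_z": 1.0, "frl": 1}, {"prior_z": 2.0, "frl": 0}]
print(compare_virtual_school(virtual, feeder, ["prior_z", "frl"]))
```

Small differences on every matching variable would be weak evidence that the matched students resemble their feeder populations; as the review notes, it says nothing about unmeasured differences or about the schools themselves.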

Primary Analysis: OLS Regression with Robust Standard Errors

To address the known and probably nontrivial intra-class correlation among students nested in the same school and schools nested in states, CREDO researchers used robust standard errors on ordinary least squares (OLS) estimates. While this technique represents a vast improvement over conventional OLS standard errors, the report omits considerable detail that a technically minded reader would need in order to understand how it was implemented.
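To show what robust standard errors for nested data usually involve, here is a minimal sketch of OLS with a cluster-robust (sandwich) covariance estimator alongside the conventional estimate, using numpy. The report does not say at what level its errors were clustered or which finite-sample correction, if any, was applied, so clustering by school and the uncorrected CR0 form used here are assumptions. The simulated data give each hypothetical school a shared random effect and assign the charter indicator at the school level.

```python
import numpy as np

def ols_with_cluster_se(X, y, clusters):
    """OLS coefficients with conventional and cluster-robust (CR0) standard errors.

    X: (n, k) design matrix that already includes an intercept column.
    y: (n,) outcome vector.
    clusters: (n,) cluster labels (schools, in this sketch).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # Conventional OLS errors assume independent, equal-variance residuals.
    sigma2 = resid @ resid / (n - k)
    se_ols = np.sqrt(np.diag(sigma2 * xtx_inv))

    # Sandwich "meat": outer products of score sums within each cluster,
    # allowing residuals to be correlated among students in the same school.
    meat = np.zeros((k, k))
    for c in np.unique(clusters):
        in_cluster = clusters == c
        score = X[in_cluster].T @ resid[in_cluster]
        meat += np.outer(score, score)
    se_cluster = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return beta, se_ols, se_cluster

# Simulated data: 40 hypothetical schools of 50 students, a shared school
# effect, and a charter indicator that varies only between schools.
rng = np.random.default_rng(0)
n_schools, per_school = 40, 50
school = np.repeat(np.arange(n_schools), per_school)
charter = (school % 2).astype(float)
y = rng.normal(0, 0.5, n_schools)[school] + rng.normal(0, 1, school.size)
X = np.column_stack([np.ones(school.size), charter])
beta, se_ols, se_cluster = ols_with_cluster_se(X, y, school)
print(beta, se_ols, se_cluster)
```

With this much within-school correlation, the cluster-robust standard error for the charter coefficient should come out noticeably larger than the conventional one. Clustering at one level still says nothing about a further level of nesting (states), which is part of why the review questions whether robust errors alone are adequate.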

Year-to-Year Gain Scores 

The primary findings presented in the CREDO report are based on an average year-to-year gain score expressed in standard deviation units. Although the technique used by CREDO researchers is not unusual, it does not reflect true longitudinal growth in individual student achievement. Rather, it represents the average growth of a group.

This again reflects possible confusion about the unit of analysis. If the group remains relatively stable, then weak inferences can be derived about the group’s relative improvement or decline in achievement. However, if the composition of the group changes, then even weak inferences about group gains or losses cannot be made, because changes in the group become confounded with simultaneous changes over time. Unfortunately, there was likely instability in the “virtual twin” group, which can only weaken the inferences drawn from these analyses. Moreover, the validity of such an approximation of longitudinal gain is stretched to the limit if even a small subsample of schools uses year-round schooling. Thus, it is not clear why consecutive year-to-year gain scores were averaged, and a justification is needed.
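The difference between averaging year-to-year gains and tracking individual growth can be seen in a small example with hypothetical scores, in which every student grows by exactly 0.10 SD per year. The sketch contrasts three quantities: the average of each student's own gain (valid when the same students are observed in both years), the change in a group's mean when the group's membership shifts between years (as could happen if a virtual twin is re-formed from different TPS students), and a simple per-student OLS growth slope across all available years.

```python
from statistics import mean

# Hypothetical scores in SD units; each student's true growth is +0.10 per year.
scores = {
    "s1": {2006: 0.00, 2007: 0.10, 2008: 0.20},
    "s2": {2006: 0.50, 2007: 0.60, 2008: 0.70},
    "s3": {2006: -0.40, 2007: -0.30, 2008: -0.20},
}

def within_student_gains(ids, year):
    """Average of each student's own gain from year - 1 to year (same students)."""
    return mean(scores[s][year] - scores[s][year - 1] for s in ids)

def group_mean_change(ids_prev, ids_curr, year):
    """Change in group means when membership differs between the two years."""
    return (mean(scores[s][year] for s in ids_curr)
            - mean(scores[s][year - 1] for s in ids_prev))

def individual_slope(sid):
    """OLS slope of one student's scores on year: an individual growth rate."""
    years = sorted(scores[sid])
    ys = [scores[sid][y] for y in years]
    xbar, ybar = mean(years), mean(ys)
    num = sum((x - xbar) * (yv - ybar) for x, yv in zip(years, ys))
    den = sum((x - xbar) ** 2 for x in years)
    return num / den

print(within_student_gains(["s1", "s2"], 2007))
print(group_mean_change(["s1", "s2"], ["s1", "s3"], 2007))
print(individual_slope("s3"))
```

With a stable group the average gain recovers the true 0.10; when one member is swapped out, the group-mean change is about -0.35 even though no student's growth changed, which is the confounding of composition change with change over time described above.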

VI. Review of the Validity of the Findings and Conclusions

Overall, the findings reported by CREDO parallel the growing body of research related to the impact of charter schools on student achievement. As previously presented, there is much variation among these studies. This report adds to that variety of designs by creating a multi-state, student-level matched design. Additionally, between three and eight years of data were pooled, depending on the state. Thus, the CREDO report had promising potential to inform researchers and policymakers about the effects of charter school attendance on student achievement. Unfortunately, a number of methodological limitations weaken the CREDO conclusions. One notable example is that the report includes little explanation for the use of averaged gain scores when true longitudinal growth could have been examined. More generally, it would have been very helpful for the report or its technical appendix to have included more details and justification for decisions regarding study design and methods of analysis.

Further, we see four notable threats to the validity of the report’s core findings. First, the CREDO analyses are grounded in the assumption that the clustering effects of state and school were minimal and were effectively attenuated by the use of robust standard errors. Yet there is no assurance that the use of robust standard errors adequately accommodates the nested structure of the data. Unfortunately, the method used for matching precluded the use of hierarchical linear modeling (HLM), which may have been more appropriate for these data.

Second, the analyses are also grounded in the assumption that averaged groups’ gain scores represent the longitudinal growth trajectories of individual students. This is a weak proxy for individual students’ longitudinal growth curves. Students should be nested in schools and, in a multi-state data set, schools should be nested in states. For example, there are notable differences in state chartering laws, such as caps, that have a nontrivial impact on the ability of schools to affect student achievement, as the CREDO report itself argues. In the present CREDO analysis, this natural nesting of the data is ignored. As a consequence, the degrees of freedom in the model hypothesis tests are extremely large, which makes even trivial differences “statistically significant.”
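The cost of ignoring nesting can be quantified with the standard design effect for equal-sized clusters, 1 + (m - 1) * rho, where m is the number of students per school and rho is the intra-class correlation. The sketch applies it to the roughly 1.7 million student records discussed in this review; the cluster size of 100 and ICC of 0.15 are hypothetical values chosen only for illustration. For state-level policy variables such as caps, the relevant number of independent units is closer to the 15 or 16 states than to the number of students.

```python
def design_effect(cluster_size, icc):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * icc."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n, cluster_size, icc):
    """Number of independent observations carrying the same information."""
    return n / design_effect(cluster_size, icc)

# Hypothetical values: about 1.7 million students, 100 per school, ICC of 0.15.
n_students = 1_700_000
deff = design_effect(100, 0.15)
n_eff = effective_sample_size(n_students, 100, 0.15)
print(f"design effect = {deff:.2f}, effective n = {n_eff:,.0f}")
```

Under these assumed values the 1.7 million observations carry roughly the information of about 107,000 independent ones, and tests that treat every student as independent overstate precision accordingly.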

This brings us to the third point regarding the validity of the findings. As can be seen from the numerous tables presenting the results of the OLS regression models, the sample sizes were extremely large (more than 1.7 million). Although there were often more than 40 predictor variables, the sheer size of the sample ensured that the mean square error would have more than 1.5 million degrees of freedom. Thus, virtually every predictor will be statistically significant. In other words, declaring an effect statistically significant when all effects are (or will be), regardless of effect size, may overstate the findings. Many of the graphs present extremely small differences in achievement (e.g., .01 and .03 SD units; p. 22) between charter schools and traditional public schools, representing differences of only .0039 and .0119 in cumulative proportion (roughly 0.4 and 1.2 percentile points for a student at the median) if one assumes a normal distribution. The CREDO researchers should revisit the practical significance of their findings, tempered by the numerous assumptions and limitations inherent in their methods.
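Both points in the paragraph above, that huge samples make tiny differences significant and that .01 and .03 SD are small in practical terms, can be checked with a few lines of arithmetic. The sketch uses a naive two-sample z statistic that treats all students as independent (exactly the assumption questioned earlier) and an assumed even split of about 1.7 million students; it is not a reproduction of CREDO's regression tests.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_sample_z(d, n1, n2):
    """Naive z statistic for a mean difference of d SD, treating students as independent."""
    return d / math.sqrt(1 / n1 + 1 / n2)

def percentile_shift(d):
    """Percentile reached by a median student who gains d SD, assuming normality."""
    return 100 * normal_cdf(d)

# Assumed even split of roughly 1.7 million students.
N_CHARTER = N_TPS = 850_000
for d in (0.01, 0.03):
    print(f"d={d:.2f}  z={two_sample_z(d, N_CHARTER, N_TPS):.1f}  "
          f"percentile={percentile_shift(d):.2f}")
```

Under these assumptions a difference of 0.01 SD yields z above 6, yet it moves a median student only from the 50th to about the 50.4th percentile; 0.03 SD moves that student to about the 51.2nd.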

Our last concern focuses on the multivariate nature of the two dependent variables (reading results and math results). It cannot be assumed that performance on reading and mathematics assessments is uncorrelated, so any analysis of reading is partially an analysis of math (and vice versa), unless each is partialled from the other. This could easily have been done in the CREDO study by simply adding the non-dependent variable into the model as a predictor. For example, if math were the dependent variable, then the first predictor would be the reading score. This would effectively partial out the covariance between math and reading, so the effects of the remaining predictor variables could be interpreted for their ability to explain variation among averaged math gain scores.
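The partialling strategy proposed above can be sketched on simulated data with numpy. The simulation parameters (a shared ability factor, a randomly assigned charter indicator, and small subject-specific charter effects) are arbitrary assumptions. The sketch shows that entering the reading score as a predictor in the math model gives the same charter coefficient as first removing from both math and the charter indicator whatever they share with reading, which is the Frisch-Waugh-Lovell result.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on X (X includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Simulated data with arbitrary, hypothetical parameters.
rng = np.random.default_rng(1)
n = 5_000
ability = rng.normal(size=n)                      # factor shared by both subjects
charter = (rng.random(n) < 0.5).astype(float)     # hypothetical charter indicator
reading_score = ability + 0.05 * charter + rng.normal(scale=0.6, size=n)
math_score = ability - 0.03 * charter + rng.normal(scale=0.6, size=n)
ones = np.ones(n)

# (1) Math model without reading: the usual total-association estimate.
b_plain = ols(np.column_stack([ones, charter]), math_score)

# (2) Math model with reading entered as the first predictor; the charter
#     coefficient now describes math variation not shared with reading.
b_partial = ols(np.column_stack([ones, reading_score, charter]), math_score)

# (3) The same charter coefficient via explicit partialling: residualize both
#     math and the charter indicator on reading, then regress residual on residual.
Z = np.column_stack([ones, reading_score])
math_resid = math_score - Z @ ols(Z, math_score)
charter_resid = charter - Z @ ols(Z, charter)
b_fwl = (charter_resid @ math_resid) / (charter_resid @ charter_resid)

print(b_plain[1], b_partial[2], b_fwl)
```

The two charter estimates differ because they answer different questions: the first describes math results overall, the second describes the part of math performance that reading does not already account for.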

We offer one final note of concern to readers attempting to ascertain the validity of the report’s key conclusions. The report’s technical appendix includes results from several OLS models, presented in tables. In many cases the predictor variables can be understood (e.g., “is English Learner,” p. 4), but for other variables it is not clear how they were coded. For example, the title of the report indicates an analysis of 16 states, yet only 15 states are represented in the OLS models (Massachusetts is missing). Thus, one is tempted to interpret Massachusetts as the referent state, against which the remaining 15 states are dummy coded. If so, what is the justification for choosing Massachusetts as the referent? This explanation should be included in the report or its technical appendix. Moreover, Grade05 is omitted in the OLS models, suggesting that this was the referent group for grade. But again, no explanation is provided.
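The referent-category question can be illustrated with a minimal dummy-coding sketch. The function name is hypothetical, and treating Massachusetts and grade 5 as the omitted categories follows this review's reading of the tables rather than anything the report states. The omitted category's rows are all zeros, so every remaining coefficient is a contrast with that category; choosing a different referent leaves the model fit unchanged but changes what each coefficient means.

```python
def dummy_code(values, reference):
    """One-hot encode a categorical variable, omitting the reference category.

    Returns the retained levels and one 0/1 row per observation. Rows for the
    reference category are all zeros, so each coefficient on a retained level
    is a contrast with the reference.
    """
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Hypothetical observations; "MA" plays the referent role suggested above.
states = ["AZ", "MA", "OH", "AZ"]
levels, rows = dummy_code(states, reference="MA")
for state, row in zip(states, rows):
    print(state, dict(zip(levels, row)))
```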

Effect of Charter School Policy on Performance

In a section of the CREDO report titled “Effects of State Charter School Policy on Performance,” the researchers explore the relationship between their state-level findings and three policy issues: (i) presence of caps on the number of charter schools, (ii) existence of multiple authorizers, and (iii) existence of an appeal process. The researchers concluded that states had significantly lower achievement growth if (a) they had caps on the number of charter schools that could operate, or (b) they had multiple authorizers. Evidence presented in the report also suggested that states with an appeal process to overturn denied charter applications had a small but significant advantage in terms of growth in student achievement.

However, given the manner in which states define caps, and given the wide range of practices when it comes to enforcing those caps, no analysis of this variable can be expected to yield valid findings. The same is true of appeals, given the wide range of practices when it comes to using the appeal process. Even the issue of multiple authorizers is clouded by distinct variations in the activity of those authorizers. Rather than look at how a state’s charter school legislation is worded, it would be more useful to base an analysis on how the legislation is being interpreted and implemented. The absence of caps, the presence of multiple authorizers, and the existence of an appeals process are all viewed as components of less restrictive charter school laws that would spur the opening of larger numbers of charter schools. (And note that the purported effects of the three factors here are inconsistent in this regard: the absence of caps and the availability of appeals are associated with higher growth, while multiple authorizers are associated with lower growth.) A better way to view whether a charter school reform is restrictive or permissive is to look at the number of charter schools in operation as well as the relative speed with which the reform has grown over time.

We created Table 1 to sort the CREDO states into three groups, depending on whether their charter school students had (a) significant positive growth relative to their virtual twins in traditional public schools, (b) significant negative growth relative to their virtual twins, or (c) no difference in growth relative to their virtual twins. The states in the positive group have relatively fewer charter schools, and many of the largest charter school states had negative findings.

The group of more successful charter school states has only 61.6 schools on average, while the unsuccessful states have 275 charter schools on average. The ranking next to each state name indicates the relative rank of the state in terms of the number of charter schools operating in 2008. There are 41 charter school laws; thus, the ranking runs from the largest charter school state, California, to the smallest charter school state, Mississippi, ranked #41 with only one charter school.

The relationship between the size of a state’s charter school reform and its relative performance, as measured by the CREDO researchers, is striking. It suggests that states with fewer charter schools are better able to oversee those schools and hold them accountable. In fact, the existence of caps in many states has been an instrument to exert pressure on authorizers to close poor-performing charter schools in order to create places for new applicant groups.

Table 1.
Number of Charter Schools per State

                                     Charter schools   Group    Group
State                      Rank      in 2008           mean     median

States with positive charter school findings
Arkansas                   #33       18
Colorado                   #9        140
Illinois
Louisiana                  #21       60
Missouri                   #23       36
subgroup total                       195               61.6     54.0

States with negative charter school findings
Arizona                    #2
Florida                    #3        348
Minnesota                  #8
New Mexico                 #15       66
Ohio                       #5        295
Texas                      #4        314
subgroup total                       1650              275.0    304.5

States with no significant differences
California                 #1        703
District of Columbia       #14       74
Georgia                    #16       65
North Carolina             #11       103
subgroup total                       945               236.3    88.5

Note: For Colorado and Illinois, the charter schools considered in the study include only those in Denver and Chicago, respectively.

Source: National Center for Education Statistics, Table 4.3, Charter school legislation by state: 2008.

Related to the overall size of a state’s charter school reform is its rate, or speed, of growth. A sensible way to implement any new school reform is to begin implementation on a small scale, to ensure that things are working as anticipated. It is not surprising, then, that the states in the low-performing group, with the possible exception of Minnesota, ramped up their charter school numbers very quickly. As a result, even a state like Ohio, which had good accountability measures written into its law, found that it was not practically possible to track and oversee the plethora of schools that were being launched.

Another factor that the CREDO researchers might consider is the relationship between state outcomes and the extent to which the charter schools are operated by private education management organizations (EMOs). Table 2 illustrates this relationship. As is apparent, the poor-performing states had 21.0% of their schools run by for-profit EMOs, while the high-performing states had only 13.7% run by for-profit EMOs. Interestingly, the successful states did have slightly more schools run by nonprofit EMOs (20.5%, compared with 18.2% for the low-performing states). The proportion of nonprofit EMOs among the high-performing states was particularly bolstered by Illinois (71.6% of all charters), where nonprofit EMOs are being used as a means of scaling up, or expanding the number of, high-performing charter schools.

Aside from the actual size of the state charter school reforms and the prevalence of EMOs, other important variables that the CREDO researchers could consider in further analyses include the rigor of the application process and the rigor of oversight. A framework for ranking charter school laws developed by Chi and Weiner (2008) and an AERA paper by Miron (2005) both outline a number of variables related to strong charter school laws, as opposed to permissive ones.



Table 2.
Proportion of Charter Schools Operated by For-Profit or Nonprofit EMOs

                           For-profit    Nonprofit
State                      EMOs          EMOs

States with positive charter school findings
Arkansas                   11.1%         27.8%
Colorado                   10.7%          1.4%
Illinois                   12.2%         71.6%
Louisiana                   7.4%          9.3%
Missouri                   38.9%          2.8%
subgroup total             13.7%         20.5%

States with negative charter school findings
Arizona                    22.3%         21.5%
Florida                    36.2%          2.9%
Minnesota                   4.1%          0.7%
New Mexico                  0.0%          0.0%
Ohio                       31.5%         22.4%
Texas                       4.8%         38.5%
subgroup total             21.0%         18.2%

States with no significant differences
California                  2.4%         15.9%
District of Columbia       10.8%         27.0%
Georgia                    12.3%          3.1%
North Carolina              4.9%          1.9%
subgroup total              4.0%         14.4%

Source: The data for this table are derived from the national profile reports of for-profit and nonprofit EMOs: http://epicpolicy.org/by-topic/publications/732
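The subgroup totals in Table 2 appear to be school-weighted (pooled) proportions rather than simple averages of the state percentages. A quick check on the one group whose counts are fully legible in Table 1 is consistent with this reading: weighting each state's percentage by its 2008 charter school count reproduces the printed 4.0% and 14.4%, whereas an unweighted average of the four for-profit percentages gives 7.6%. This is a sketch under that assumption; the statewide counts in Table 1 may not be the exact denominators used in the EMO profile reports.

```python
# Table 1 counts and Table 2 percentages for the states with no
# significant differences: (charter schools in 2008, % for-profit, % nonprofit).
states = {
    "California": (703, 2.4, 15.9),
    "District of Columbia": (74, 10.8, 27.0),
    "Georgia": (65, 12.3, 3.1),
    "North Carolina": (103, 4.9, 1.9),
}

def pooled_pct(rows, column):
    """School-weighted share: EMO-run schools summed over states / all schools."""
    schools = sum(row[0] for row in rows)
    emo_schools = sum(row[0] * row[column] / 100 for row in rows)
    return 100 * emo_schools / schools

def unweighted_pct(rows, column):
    """Simple average of the state percentages (each state counts equally)."""
    return sum(row[column] for row in rows) / len(rows)

rows = list(states.values())
for label, column in (("for-profit", 1), ("nonprofit", 2)):
    print(label, round(pooled_pct(rows, column), 1), round(unweighted_pct(rows, column), 2))
```

The difference matters because a single large state (here, California) dominates a pooled figure, so group comparisons built on pooled shares largely reflect the biggest charter sectors.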



VII. Usefulness of the Report for Guidance of Policy and Practice

The CREDO researchers noted that this report would be followed by two additional reports. The relative strength and comprehensiveness of their data set, as well as the relatively solid analytic approaches of the researchers, make this first report a useful contribution to the charter school research base. The two future reports should add further information that will be useful for policymakers. This review does point to some weaknesses and areas for improvement, many of which represent limitations (not outside the range of limitations inherent in other studies of student achievement in charter schools) that should be shared in the technical report. The review offers suggestions that we hope will help improve the subsequent reports produced by CREDO based on this multi-state data set.
For example, if a student scored 640 on the SAT (assuming a test mean of 500 and an SD of 100), that student’s z-score would be 1.40. If a second student took the ACT and received a composite of 27 (assuming a test mean of 20 and an SD of 5), their z-score would also be 1.40. Although the z-scores are the same, few people would argue that each student would receive the same z-score if each took the other test. All the z-score transformation has done is place each student in a relative position within the “group.” In this case, it is not even clear which group (within a state) was used to form the z-score transform.
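The arithmetic in this example is the standard z-score transform, z = (x - mean) / SD. A minimal Python sketch using the note's assumed means and SDs shows the two students landing on the same standardized value even though nothing about the two tests has been equated.

```python
def z_score(raw, test_mean, test_sd):
    """Standardize a raw score against an assumed reference-group mean and SD."""
    return (raw - test_mean) / test_sd

sat_z = z_score(640, 500, 100)  # SAT example: mean 500, SD 100
act_z = z_score(27, 20, 5)      # ACT composite example: mean 20, SD 5
print(sat_z, act_z)  # 1.4 1.4
```

The transform only encodes position relative to whichever reference group supplied the mean and SD; it says nothing about whether the two scales measure the same content.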

A footnote on page 2 of the technical appendix indicates that the z-score transform may have pooled either grades or subject matter. We can see no defensible reason for calculating the z-score transform from any pooled group, whether pooled over years, grades, or subject tests. That is, the transform is only meaningful (given the two assumptions stated below) if calculated within year, within grade, and within subject content. Moreover, unless the “group” distributions are approximately normal, equal z-scores (such as the 1.40 above) can mean very different things. Said another way, converting raw scores to z-scores does not change the shape of the group distribution; it just forces the mean to be 0.0 and the SD to be 1.0. So to believe that converting different state achievement test scores to a z-score scale provides a common score scale for interpretation requires two major assumptions. First, the content of the different tests must have substantial overlap, and this must be true across all grade levels of all assessments. Second, the distribution of scores for all assessments, at all grades, in all years must be approximately the same. Unfortunately, the former assumption would be difficult (although possible) to validate, and while the latter assumption could easily be validated empirically, the technical appendix provides no information on the score distributions that would allow readers to do so.
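A minimal sketch of what this note recommends, assuming each record carries hypothetical `year`, `grade`, `subject`, and `score` fields: compute the mean and SD separately within each year-by-grade-by-subject cell and standardize against that cell only. The skewness check illustrates the second point: because the z-transform is linear, a skewed cell remains exactly as skewed after standardization.

```python
from collections import defaultdict
from statistics import fmean, pstdev

def skewness(values):
    """Population skewness: mean cubed standardized deviation."""
    m, s = fmean(values), pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])

def standardize_within_cells(records):
    """Convert raw scores to z-scores separately within each (year, grade, subject) cell.

    Returns z-scores in the same order as the input records.
    """
    cells = defaultdict(list)
    for r in records:
        cells[(r["year"], r["grade"], r["subject"])].append(r["score"])
    stats = {key: (fmean(v), pstdev(v)) for key, v in cells.items()}
    z = []
    for r in records:
        m, s = stats[(r["year"], r["grade"], r["subject"])]
        z.append((r["score"] - m) / s)
    return z

# Hypothetical scores: one right-skewed cell and one symmetric cell.
records = (
    [{"year": 2008, "grade": 4, "subject": "math", "score": s} for s in (10, 11, 12, 13, 30)]
    + [{"year": 2008, "grade": 5, "subject": "reading", "score": s} for s in (10, 20, 30, 40, 50)]
)
z = standardize_within_cells(records)
math_raw, math_z = [r["score"] for r in records[:5]], z[:5]
reading_z = z[5:]

# Each cell now has mean 0 and SD 1, but the math cell keeps its skew.
print(round(skewness(math_raw), 3), round(skewness(math_z), 3))
print(round(skewness(reading_z), 3))
```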

Regarding the assumption of substantial content overlap, we know of no comprehensive research that has examined the content overlap of the different state achievement tests. Moreover, in many states the purpose of the test changes as a function of the assessment grade. For instance, in Michigan the MEAP, a basic skills test, is administered before high school, but in high school the Michigan Merit Exam is administered, and it is more of an aptitude test than a basic skills test.

For all these reasons, the validity of the standardizing method used in the CREDO report is less than 
perfect and may undermine its conclusions. Unfortunately, CREDO researchers merely 
dismiss these issues in the statement “minor differences may remain after these adjust- 
ments” (pg. 13). 

The term “twin,” which CREDO uses to describe the matched TPS case for each charter school student, is potentially problematic, since a great deal of research activity focuses on the study of familial twins. Both research methods and statistical analyses are well established for that twin research; in fact, proper statistical analyses in that area must account for the clustered nature of the twin data. Since the new study’s use of the word “twin” does not refer to that twin research (or use its methods), it would be helpful to tell readers explicitly that the term is used in a sense unique to this study.

Miron, G., Cullen, A., Applegate, B., & Farrell, P. (2007). Evaluation of the Delaware charter 
school reform: Final Report. Dover, DE. Delaware State Board of Education. Retrieved 
28

It is not completely clear how student mobility is addressed. It appears that even if a charter school student remains in the same charter school, that student could have more than one virtual twin if the first virtual twin moves to a charter school.

29 

However, if the virtual school were substantially different from the feeder schools, then the matching procedure has adjusted the characteristics of the virtual school to better match the charter school characteristics, and the difference between the two (the feeder schools and the virtual school) may represent some of the hypothesized selection bias. So it remains unclear how effective the matching procedure was in reducing the unknown but nonzero selection bias among the charter school students.

30 

For example, it is not clear whether the CREDO researchers simply used an asymptotic covariance matrix of the estimates (thus accounting for heteroscedasticity present in the data) or took a more structured approach that would directly account for the clustering effects. We suspect the former, since the clustering effect, while present among the virtual twin students, cannot be clearly represented by school: the virtual twins have no school of their own, whereas the charter school students do. Moreover, clustering at the state level should have been better addressed in the CREDO analyses.
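Why the choice matters can be seen from the standard (Kish) design-effect approximation for clustered samples: with m students per school and intra-class correlation ρ, the sampling variance of a mean is inflated by roughly 1 + (m - 1)ρ relative to independent observations. A heteroscedasticity-robust covariance matrix still treats observations as independent and does not apply this correction. The sketch uses hypothetical values (25 students per school, ρ = 0.20) and is an illustration only, not a reconstruction of CREDO's estimates.

```python
import math

def design_effect(cluster_size, icc):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc

def clustered_se(naive_se, cluster_size, icc):
    """Inflate a standard error computed as if observations were independent."""
    return naive_se * math.sqrt(design_effect(cluster_size, icc))

# Hypothetical values: 25 tested students per school, ICC of 0.20,
# and a naive (independence-based) standard error of 0.01 SD.
deff = design_effect(25, 0.20)
print(round(deff, 2), round(clustered_se(0.01, 25, 0.20), 4))  # 5.8 0.0241
```

Even a modest ICC more than doubles the standard error here, which is why ignoring clustering can make small charter/TPS differences look statistically significant.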

31 

There are several design approaches CREDO could have used with such a large database, including the use of the pre-test values in a regression discontinuity analysis. As stated earlier, an HLM might be most appropriate, and perhaps a matched-pairs design (which is used in true twin research) would have worked better. We suspect that matching on the pre-test values may have created some difficulties (hence the wide matching range); thus, propensity score matching could have been used, or the researchers might even have used prior test scores as a covariate.
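The trade-off this note alludes to can be sketched as nearest-neighbor matching on a prior-test score within a caliper: a narrow caliper leaves hard-to-match students (here, a very low scorer) unmatched, while widening the caliper matches everyone at the cost of looser matches. The data, caliper widths, and function name below are hypothetical, and this is not CREDO's matching procedure, which is not reproduced here.

```python
def caliper_match(treated, pool, caliper):
    """Nearest-neighbor match on pre-test z-scores, with replacement.

    For each treated case, return the index of the closest pool case if its
    pre-test differs by at most `caliper`; otherwise return None.
    """
    matches = []
    for t in treated:
        best = min(range(len(pool)), key=lambda i: abs(pool[i] - t))
        matches.append(best if abs(pool[best] - t) <= caliper else None)
    return matches

# Hypothetical pre-test z-scores.
charter_pretest = [0.04, 1.20, -2.50]
tps_pretest = [0.0, 0.1, 1.0, 1.15, -1.0]

print(caliper_match(charter_pretest, tps_pretest, caliper=0.1))  # [0, 3, None]
print(caliper_match(charter_pretest, tps_pretest, caliper=2.0))  # [0, 3, 4]
```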

32 

Given access to linked individual student records, we wonder why student growth modeling was not done, with school type (charter or traditional public) as a level-2 (or level-3) predictor, depending on the other covariates in the model. This type of analysis directly addresses what the CREDO researchers note to be the primary question of the evaluation: “. . .a current and comprehensible analysis about how well [charter schools] do educating their students” (pg. 1). If the focus of the study is on examining the “effects” of charter schools on student achievement, then student achievement should be modeled within the nested structure of the data. This would appropriately account for the intra-class correlation present at each level of the analysis.
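The intra-class correlation mentioned here measures how much of the score variance lies between schools rather than within them. A common estimator, ICC(1) from a one-way ANOVA, is sketched below for equal-sized schools using hypothetical data. A full multilevel growth model would estimate the variance components directly, so this is only an illustration of the quantity involved.

```python
from statistics import fmean

def icc1(groups):
    """ICC(1) from a one-way ANOVA with equal group sizes k:
    (MSB - MSW) / (MSB + (k - 1) * MSW).
    """
    k = len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    n_groups = len(groups)
    grand = fmean(x for g in groups for x in g)
    means = [fmean(g) for g in groups]
    ssb = k * sum((m - grand) ** 2 for m in means)                   # between-school sum of squares
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)  # within-school sum of squares
    msb = ssb / (n_groups - 1)
    msw = ssw / (n_groups * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical scores for three schools with three students each.
schools = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(round(icc1(schools), 3))  # 0.897
```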
For example, if reading and math were correlated at .7, this would mean that about 50% of the variance in reading is shared with math (.7² = .49), and vice versa. In that case, a treatment that changes math scores would necessarily also affect about 50% of the variance in reading scores. It would be easy, and wrong, to then attribute the change in reading to the treatment when, in fact, the change in reading scores reflects the shared variance with math scores. An unconfounded picture of reading (or math) can be obtained by including the other subject’s score in the model as a predictor.

34 

Some states do not have caps on the number of charter schools, but they do have restrictions on where charter schools can operate. The CREDO researchers did try to account for the manner in which the caps affected the number of charter schools, but this attempt still could not capture the different ways in which caps are interpreted and eventually influence the number of charter schools.